Attribute Classification Using Feature Analysis
Abstract
Many systems that integrate data from multiple sources are built on a set of correspondences between source schemata and a target schema. A correspondence expresses a relationship between a set of source attributes, possibly drawn from multiple sources, and a set of target attributes. Clio is an integration tool that assists users in defining value correspondences between attributes [1]. In real-life scenarios there may be many sources, and the source relations may have many attributes. Users can get lost and may miss, or be unable to find, some correspondences. Moreover, in many real-life schemata the attribute names reveal little or nothing about the semantics of the data values, so only the data values in the attribute columns convey the meaning of the attribute. Our work relieves users of the problems of too many attributes and meaningless attribute names by automatically suggesting correspondences between source and target attributes. For each attribute, we analyze the data values and derive a set of features. Together, these features form the characteristic signature of the attribute. Correspondences are more likely between attributes with similar signatures than between attributes with dissimilar ones. Our results show that a small, properly chosen set of domain-independent features can capture most of the structural information about attributes.
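The signature idea described above can be illustrated with a minimal sketch. The abstract does not list the actual features or the similarity measure, so everything here is an assumption made for illustration: the five features (character-class ratios, distinct-value ratio, scaled mean length), the Euclidean distance, the scaling constant 32, and the hypothetical names `signature` and `suggest`.

```python
import math


def signature(values):
    """Return a small, domain-independent feature vector for one attribute column.

    The features are illustrative assumptions, not the ones used in the paper.
    """
    strings = [str(v) for v in values]
    chars = "".join(strings)
    n_chars = max(len(chars), 1)   # avoid division by zero on empty columns
    n_vals = max(len(strings), 1)
    mean_len = sum(len(s) for s in strings) / n_vals
    return [
        sum(c.isdigit() for c in chars) / n_chars,      # share of digits
        sum(c.isalpha() for c in chars) / n_chars,      # share of letters
        sum(not c.isalnum() for c in chars) / n_chars,  # share of other characters
        len(set(strings)) / n_vals,                     # share of distinct values
        min(mean_len / 32.0, 1.0),                      # mean length, scaled into [0, 1]
    ]


def suggest(sources, targets):
    """Map each source attribute to the target attribute with the nearest signature."""
    target_sigs = {name: signature(vals) for name, vals in targets.items()}
    suggestions = {}
    for name, vals in sources.items():
        sig = signature(vals)
        # Euclidean distance between signatures (an assumed similarity measure).
        suggestions[name] = min(
            target_sigs, key=lambda t: math.dist(sig, target_sigs[t])
        )
    return suggestions


# Source attribute names carry no meaning; only the values do.
sources = {
    "col_a": ["555-1234", "555-9876", "555-0000"],
    "col_b": ["Alice", "Bob", "Carol"],
}
targets = {
    "phone": ["444-1111", "444-2222", "444-3333"],
    "name": ["Dave", "Eve", "Frank"],
}
print(suggest(sources, targets))
```

In this toy example the suggestion depends only on the structure of the values (digits and punctuation versus letters), not on the attribute names, which is the situation the abstract has in mind. A real system would present such suggestions to the user for confirmation instead of applying them automatically.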
Similar resources
Feature selection using genetic algorithm for classification of schizophrenia using fMRI data
In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...
Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection
Automatic processes on seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is the seismic object detection using different supervised classification methods that finally has an output as a probability cube. Object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression gives access to a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting valuable, high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
Applying Genetic Algorithm to EEG Signals for Feature Reduction in Mental Task Classification
Brain-Computer interface systems are a new mode of communication which provides a new path between brain and its surrounding by processing EEG signals measured in different mental states. Therefore, choosing suitable features is demanded for a good BCI communication. In this regard, one of the points to be considered is feature vector dimensionality. We present a method of feature reduction us...
Comparison of Parametric and Non-parametric EEG Feature Extraction Methods in Detection of Pediatric Migraine without Aura
Background: Migraine headache without aura is the most common type of migraine especially among pediatric patients. It has always been a great challenge of migraine diagnosis using quantitative electroencephalography measurements through feature classification. It has been proven that different feature extraction and classification methods vary in terms of performance regarding detection and di...